Chronic Lung Disease as a Risk Factor for Long COVID in Patients Diagnosed With Coronavirus Disease 2019: A Retrospective Cohort Study

Abstract Background Patients with coronavirus disease 2019 (COVID-19) often experience persistent symptoms, known as postacute sequelae of COVID-19 or long COVID, after severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Chronic lung disease (CLD) has been identified in small-scale studies as a potential risk factor for long COVID. Methods This large-scale retrospective cohort study using the National COVID Cohort Collaborative data evaluated the link between CLD and long COVID over 6 months after acute SARS-CoV-2 infection. We included adults (aged ≥18 years) who tested positive for SARS-CoV-2 during any of 3 SARS-CoV-2 variant periods and used logistic regression to determine the association, considering a comprehensive list of potential confounding factors, including demographics, comorbidities, socioeconomic conditions, geographical influences, and medication. Results Of 1 206 021 patients, 1.2% were diagnosed with long COVID. A significant association was found between preexisting CLD and long COVID (adjusted odds ratio [aOR], 1.36). Preexisting obesity and depression were also associated with increased long COVID risk (aOR, 1.32 for obesity and 1.29 for depression) as well as demographic factors including female sex (aOR, 1.09) and older age (aOR, 1.79 for age group 40–65 [vs 18–39] years and 1.56 for >65 [vs 18–39] years). Conclusions CLD is associated with higher odds of developing long COVID within 6 months after acute SARS-CoV-2 infection. These data have implications for identifying high-risk patients and developing interventions for long COVID in patients with CLD.

Coronavirus disease 2019 (COVID- 19), caused by viral infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has affected millions of patients worldwide.In the United States (US), there have been >100 million confirmed cases of COVID-19, with >1 million deaths reported between January 2020 and March 2023 [1].Some survivors present with persistent neurological, respiratory, or cardiovascular symptoms after the acute phase of the infection, regardless of the initial disease severity [2].The Centers for Disease Control and Prevention (CDC) describes these long-term effects of SARS-CoV-2 infection as post-COVID conditions (PCCs), postacute sequelae of COVID-19 (PASC), or long COVID [3].Per the CDC, long COVID consists of a wide range of new, returning, or ongoing health problems occurring ≥4 weeks after first being infected with SARS-CoV-2 [3].A meta-analysis showed that long COVID is associated with poor quality of life [4].A cross-sectional study conducted in the US in 2021-2022 found that 13.9% of US adults with a prior positive COVID-19 test reported continued symptoms >2 months after acute illness, representing 1.7% of US adults [5].Based on data collected by the US Census Bureau and CDC in June 2022, 19% of adults who have had COVID-19 have persistent symptoms of long COVID, and 7.5% of adults in the US have symptoms that have lasted >3 months after first infection [6].An estimated 2 million people in the United Kingdom self-reported long COVID symptoms as of 5 March 2023 [7].Data from the Survey of Health, Ageing and Retirement in Europe's corona surveys (2020-2021) indicated a prevalence of 71.6% for long COVID with an average of 3 symptoms among the respondents aged ≥50 years in 27 countries across Europe [8].Given the substantial prevalence of long COVID, subsequent compromised quality of life, and the stress exerted on the healthcare system, it is warranted to identify the risk factors for long COVID and vulnerable populations.
Small-scale studies have been conducted to identify risk factors for long COVID.Age above 40-50 years, female sex, belonging to an ethnic minority group, smoking, obesity, higher number of preexisting medical conditions, and socioeconomic deprivation have shown an association with greater risk of long COVID, while completion of primary vaccination series prior to infection shows protective effects [5,6,9].Previous studies based on regional COVID-19 patients suggest that preexisting lung disease, such as chronic obstructive pulmonary disease (COPD) and asthma, are risk factors for long COVID [7,[9][10][11][12][13].A cross-sectional study that focused on adults hospitalized with a diagnosis of acute COVID-19 during the first 2 waves in the United Kingdom found that preexisting lung disease was associated with a greater number of long COVID symptoms [7].A cohort study in Italy demonstrated a significant association between obstructive lung disease and long COVID among 739 nonhospitalized healthcare workers confirmed COVID-19 positive [13].These studies were based on select study populations and conducted within different time frames.Many of them relied on a limited sample size and focused on a narrower timeframe of SARS-CoV-2 infection.The lack of a shared and specific definition of long COVID, and limited confounding factors included in the analyses, also suggests the need for further evaluation of the association between preexisting chronic lung diseases (CLDs) and the occurrence of long COVID.More recently, a retrospective case-control study based on the National COVID Cohort Collaborative (N3C) established a machine learning-based PASC diagnosis with CLD as one of the predictors in the model [14].Large, randomized controlled trials, such as the Longitudinal Study of COVID-19 Sequelae and Immunity (RECON-19) [15] and Researching COVID to Enhance Recovery (RECOVER) [16] studies, have focused on cohort characterization and symptom clustering.However, systematic and large-scale studies evaluating associations between CLD and long COVID after acute SARS-CoV-2 infection are still limited.
This study aimed to investigate the impact of preexisting CLD on the clinical diagnosis of long COVID and assess the risk factors for developing long COVID among adult patients with CLD using real-world data from a large, nationally sampled cohort.Based on the reviewed literature, we hypothesized that adult patients with preexisting CLD will exhibit a higher risk of long COVID compared to those without.

Study Design
The analytical data were sourced from the N3C Data Enclave, which harmonizes electronic health record data from 75 institutions into a single common data model.The criteria for confirmed SARS-CoV-2 infection (or confirmed COVID-19 positive) were based on the International Classification of Diseases, Tenth Revision (ICD-10) diagnostic code for COVID-19 diagnosis (U07.1) or positive SARS-CoV-2 laboratory result (qualitative reverse-transcription polymerase chain reaction or antigen test).The follow-up period for the cohort was 6 months after the acute phase of infection, which was defined as 4 weeks following the COVID-19 confirmation date.Patients were included if they (1) were ≥18 years of age when confirmed COVID-19 positive and (2) were confirmed COVID-19 positive during any of 3 consecutive SARS-CoV-2 variant dominance periods, defined as pre-Delta wave (29 December 2020-14 June 2021), Delta wave (15 June 2021-29 November 2021), and Omicron wave (30 November 2021-15 June 2022) [17].We excluded patients who (1) were lost to follow-up without a valid long COVID diagnosis reported; (2) who died directly from COVID-19 during acute phase (within the 30-day time window after the acute COVID-19 diagnosis); and (3) who had missing information.

Outcome
Our outcome was the incidence of long COVID diagnosis during the 6-month follow-up period after the acute phase of SARS-CoV-2 infection.The diagnosis of long COVID was denoted by the ICD-10 diagnostic code for long COVID (U09.9).Long COVID diagnoses recorded during the acute phase were considered invalid based on the definition of long COVID by the CDC [3].

Risk Factors
The primary exposure of this study was CLD before COVID-19 diagnosis using a clinician-and informatician-validated computable definition [18].It refers to chronic disorders that affect the lungs and other parts of the respiratory system, including the conditions of COPD, sleep-disordered breathing, and interstitial lung disease [19].The most common symptoms are shortness of breath, coughing, wheezing, and tightness in the chest [19].For this study, we identified a cohort with preexisting CLD before the first acute COVID-19 diagnosis in the N3C Data Enclave (the CLD group).In addition to the primary exposure, we included variables potentially associated with the outcome.Demographic characteristics were captured using age, sex, and race/ethnicity.Social history factors included tobacco use and substance abuse.Vaccination status was categorized into 3 groups: (1) incomplete vaccination (including patients who were unvaccinated or failed to complete the primary vaccination series, the reference); (2) completed primary series (including patients who completed the primary vaccination series but not boosted); and (3) boosted (including patients who received 1 or more boosters).We also accounted for the effects of medication history before the first SARS-CoV-2 infection and medications for acute COVID-19 treatment (see details in Supplementary Table 1), and all of the variables on medication use were dichotomous.The severity of infection was included as a 4-level ordinal variable in the analysis as well, ranging from "mild with no emergency department visit or hospitalization" to "use of intermittent mandatory ventilation or extracorporeal membrane oxygenation during COVID hospitalization."Immune status after recovery was included using SARS-CoV-2 antibody status after infection; repeated SARS-CoV-2 infection following the initial infection was also considered as a potential risk factor for long COVID.Medical conditions were assessed with comorbidities including cardiovascular diseases, infectious diseases, neurological and cognitive disorders, and malignant tumors.We also adjusted for the effects of variant dominance periods and healthcare access based on geographic areas.All of the covariates mentioned above are categorical variables and were selected based on a priori knowledge of risk factors for long COVID [5,[9][10][11][20][21][22].In short, the confounders we adjusted for in our analysis include age, sex, race/ethnicity, tobacco usage, substance use disorder, COVID-19 vaccination status, medications, the severity of infection, SARS-CoV-2 antibody status after infection, reinfection, comorbidities, variant waves, and urban/rural residency.The summary statistics of all adjusted variables are provided in Supplementary Table 1.

Statistical Analysis
We used descriptive statistics to show the distribution of demographic features, social history, health status, and medical conditions among the cohort patients and compared the distributions of these covariates between the cohort patients and those lost to follow-up (Supplementary Tables 1 and 2).We also used marginalized and conditional probabilities to demonstrate the incidence of long COVID and its distribution among stratified patient populations (Supplementary Table 3).
Primary Analyses.We assessed the association between CLD and long COVID with a multiple logistic regression model that adjusted for all the covariates, including demographic variables, social factors, immune status, medication use, and comorbidities.We computed adjusted odds ratios (aORs) with accompanying 95% confidence intervals (CIs) to evaluate the effects of underlying CLD on the incidence of long COVID.We also report aORs and 95% CIs for other risk factors.
Subgroup Analysis.We assessed the risk factors of long COVID among the patients with preexisting CLD, using multiple logistic regression, and report the aOR and 95% CI.
Sensitivity Analysis.Some of the patients with long COVID diagnoses in our cohort had a follow-up period of <6 months.Since we were unable to discern the reasons for the loss of follow-up from N3C and this heterogeneity might potentially cause bias, we removed these patients from the cohort and repeated the proposed primary analysis to check how it would influence our results.In addition, long COVID is potentially underreported in the N3C data.We randomly removed the diagnosis of a proportion of the confirmed long COVID cases from the cohort to simulate an underdiagnosis scenario and repeated the primary analysis to understand the robustness against the underreporting issue of our analysis.Death is a competing risk of long COVID, which potentially led to bias in the observation time for long COVID.We performed a sensitivity analysis by repeating the primary analysis on the cohort after removing all patients with post-acute COVID-19 death.

Study Population
We extracted data from N3C on 4 March 2023.A total of 3 222 patients who were confirmed COVID-19 positive during the time period from 29 December 2020 to 15 June were reported to N3C.Of these, 2 930 405 patients were adults (≥18 years of age) at COVID-19 diagnosis.A total of 2 893 patients survived the acute phase of infection.Among these survivors, 1 687 225 patients were lost to follow-up within the first 6 months without a long COVID diagnosis.We finally included 1 206 021 patients in the analysis.We divided the patient cohort into 2 groups (CLD group vs non-CLD group) based on the presence of underlying CLD (Figure 1).

Descriptive Data
Nearly half (46%) of the patients in the cohort were aged 40-65 years.Almost two-thirds (62%) of the patients were female.Comparing the subcohort with preexisting CLD (CLD group) and the subcohort without preexisting CLD (non-CLD group), the CLD group had a higher percentage of elderly (>65 years old), female, and non-Hispanic Black patients.The number of confirmed COVID-19 cases across 3 consecutive SARS-CoV-2 variant dominance periods was comparable (pre-Delta vs Delta vs Omicron: 34% vs 29% vs 37%).Among the cohort, 17.1% of the patients had preexisting CLD.Baseline patient characteristics distributions are shown in Figure 2. Complete baseline patient characteristics are summarized in Supplementary Table 1.The patients lost to follow-up had similar baseline patient characteristics distributions as our study cohort, indicating the loss to follow-up mechanism being random.Therefore, there is very minor concern about biased results due to the exclusion of this cohort.Details are shown in Supplementary Table 2.
In the cohort, 1.2% of the patients were diagnosed with long COVID according to the ICD-10 diagnostic code for long COVID (U09.9) during the follow-up period.Patients in the CLD group had a higher incidence of being diagnosed with long COVID compared to those in the non-CLD group (2.1% vs 1.1%).Each type of CLD, including COPD, asthma, and bronchiectasis, was related to higher incidence of long COVID (COPD: 2.1% vs 1.2%; asthma: 2.1% vs 1.1%; bronchiectasis: 3.2% vs 1.2%).The incidence of long COVID in the older age group was higher compared to the younger age group (>65 years vs 40-65 years vs 18-39 years: 1.5% vs 1.5% vs 0.7%).COVID-19 cases during Delta and Omicron wave showed higher incidences for long COVID compared to pre-Delta wave (pre-Delta vs Delta vs Omicron: 0.3% vs 1.7% vs 1.7%).Details of long COVID incidences are shown in Supplementary Table 3.

Primary Analysis
From the multiple logistic regression adjusted for demographic, socioeconomic, and medical factors, CLD was significantly associated with long COVID.After adjustments, patients with CLD had 36% higher odds of long COVID than those without (adjusted odds ratio [aOR], 1.36 [95% CI, 1.3-1.41]).Among all the other covariates included in the model, several factors also significantly influenced the odds of long COVID.Female sex, older age, living in an urban area, infection during Delta or Omicron waves, higher level of COVID-19 severity during the acute COVID-19 phase, reinfection, and positive antibody after infection were associated with higher odds of long COVID.Compared to non-Hispanic Whites, the odds  of getting diagnosed with long COVID were higher in Hispanic patients (aOR, 1.07 [95% CI, 1.01-1.13])and lower in non-Hispanic Blacks (aOR, 0.87 [95% CI, .82-.91]).Some preexisting diseases-including rheumatological disease, obesity, depression, and malnutrition-were also associated with higher odds of long COVID.Diseases associated with lower odds of long COVID included psychosis, diabetes, dementia, heart failure, cancer, hemiplegia or paraplegia, human immunodeficiency virus (HIV) infection, hypertension, and solid organ transplant.Some medications taken before the infection or administered during COVID-19 hospitalization were associated with higher odds, including systemic corticosteroids, remdesivir, immunosuppressants, and therapeutic agents for COVID-19.The anticoagulant and vasopressor were associated with a lower risk of long COVID.History of tobacco use and history of substance abuse were associated with lower odds of developing long COVID after survival from acute COVID-19 (aOR, 0.84 [95% CI, .78-.90] and 0.7 [95% CI, .64-.77], respectively).Full vaccination or having a booster were related to lower odds of long COVID (aOR, 0.94 [95% CI, .90-.98] and 0.94 [95% CI, .90-.98], respectively).Results on demographic, socioeconomic, and geographical covariates are presented in Figure 3.All results of the primary analysis are summarized in Supplementary Table 4.

Subgroup Analysis
Among patients with CLD, similar to the analysis on the whole cohort, female sex, older age, living in an urban area, infection during Delta or Omicron wave, higher level of severity, reinfection, and positive antibody status were related to higher odds, and tobacco use and substance abuse disorder to lower odds, of long COVID diagnosis.Rheumatologic disease, depression, obesity, and malnutrition were associated with high odds of long COVID.However, vaccination status (completion of primary series or receipt of 1 or more booster doses) did not show a significant effect on long COVID (aOR, 1.04 [95% CI, .96-1.13] and 0.99 [95% CI, .92-1.07], respectively).For patients without CLD before COVID-19, both full vaccination and boosted status were related to a lower risk for long COVID compared to not fully vaccinated (aOR, 0.91 [95% CI, .86-.96] and 0.92 [95% CI, .87-.97], respectively; Supplementary Tables 5 and 6).

Sensitivity Analysis
First, the follow-up period was <6 months for 10 215 patients in our cohort who were diagnosed with long COVID within the 6-month period.After removing these patients, repeat analyses on the remaining 1 195 806 patients with the same followup demonstrated similar risk factors for long COVID (Supplementary Table 7).Second, regarding the concern on underreported long COVID cases, after randomly removing 50% of the confirmed long COVID cases from the cohort, analyses in this underdiagnosis scenario also presented similar results (Supplementary Table 8).Third, sensitivity analysis on the potential competing risk impact of death also showed similar results to the primary analysis (Supplementary Table 9).Overall, our primary analysis results were robust.

DISCUSSION
In this retrospective cohort study of adult patients diagnosed with COVID-19 who survived to follow-up, we evaluated the association between preexisting CLD and long COVID, and identified the risk factors for developing long COVID among adult patients with CLD.
A prediction model based on N3C data identified CLD as one of the explanatory features for the incidence of long COVID [23].A cohort study in the Kingdom of Saudi Arabia showed that preexisting lung disease was a predictor for dyspnea and was associated with higher risk for chronic fatigue between 6 weeks and 6 months after hospital discharge in patients with COVID-19 [24].Our results came to the same conclusion.After adjusting for the confounders, we demonstrated a significant association between preexisting CLD and higher odds of long COVID.We also found that patients were more likely to develop long COVID if they were female, at older ages, or living in an urban area.The effects of sex and age are consistent with previous studies [21,25].The association with living in an urban area could be partially explained by the fact that more air pollution in urban area contributes to lung disease [26].It is also potentially confounded by the fact that people living in urban areas may have more access to healthcare resources and may be more likely to be diagnosed [27].In addition, some risk factors, such as positive antibody after infection, reinfection, rheumatologic disease, depression, obesity, and malnutrition, may be related to highly activated immune cells and elevated expression of cytokines [28][29][30][31][32][33].The higher odds of long COVID could be explained by the immune and inflammatory status under these conditions.Our findings on COVID-19 waves differ from a Spanish retrospective study (N = 325) and a Brazilian case-control study (N = 7051), which identified infection during Delta and Omicron waves as protective factors against long COVID [34,35].This discrepancy highlights the importance of conducting the research in different patient populations to investigate the effects of COVID-19 waves.
This study also finds some protective factors against long COVID, such as HIV infection, tobacco use, substance abuse, and vaccination.Existing evidence suggests that immunosuppression could help tamper the cytokine storm or other aspects of the inflammatory response and thus HIV, characterized by immunodeficiency, is shown to have protective effects against long COVID [36].In terms of tobacco use, studies have shown that smokers are less likely to seek healthcare help for respiratory symptoms and long COVID might be underdiagnosed in this population [37,38].Similarly, help-seeking behavior is less frequent in people with substance abuse [39].This may account for the counterintuitive results that tobacco use and substance abuse were associated with lower odds for long COVID diagnosis.Our results concur with a systematic review and metaanalysis showing that vaccines have a beneficial effect on reducing the risk of long COVID [40].However, the effects of vaccines were not statistically significant among those with CLD.The adverse impacts of lung disease potentially offset the benefits of vaccination in the CLD subgroup.
Our study is novel in assessing long COVID in a large patient cohort over 3 consecutive COVID-19 waves.Also, we used long COVID diagnosis based on ICD-10 codes for long COVID (U09.9) to evaluate the outcome, instead of based on persistent PCCs.Use of the ICD-10 code was a unified way to define the response variable and made it easier to compare the results across studies using the same criteria for the response variable.Another notable advantage of our approach was the comprehensive integration of covariates associated with socioeconomic conditions, geographical influences, and preexisting or concurrent medication use during the COVID-19 pandemic.This not only allowed us to further mitigate potential biases but also facilitated exploration of potential associations with factors such as healthcare accessibility, predominant SARS-CoV-2 variants, vaccination status, and disease severity.These aspects contributed to the robustness and depth of our model.
Our study on long COVID has some limitations.There is potential for underreporting of the long COVID diagnosis and some of the chronic conditions in the dataset.Only 1.2% of our cohort had a long COVID diagnosis on record during the follow-up, a much lower proportion compared to that in other studies.A survey in Italy showed that 28.9% of COVID-19 patients reported persistent symptoms >4 weeks from the onset of infection [41].The Census Bureau's Household Pulse Survey showed that the prevalence of long COVID decreased to 6% among US adults in June 2023 [42].Unawareness of long COVID and few show-ups at clinic for persistent symptoms in the early stage of the pandemic account for the underestimated long COVID cases in our study.Nevertheless, sensitivity analysis supports the robustness of the results to the underreporting issue.In addition, the independent and dependent variables are constructed based on the concept sets (ICD-10 codes, etc) that are dependent on clinician and data coder inputs; data from different sites are potentially heterogeneous.There was no effort as part of this project in verifying these data at the site level as the N3C provides a limited data set with no potential reidentification of patients and no potential for chart review.We suggest improving the dataset by establishing clear definitions for certain variables and exploring interactions between diseases and medication history to improve accuracy.Despite these limitations, our study provides valuable insights into potential risk factors associated with long COVID.

CONCLUSIONS
Preexisting CLD before COVID-19 is a risk factor for long COVID within 6 months after the acute phase of illness.Patients with CLD should be closely monitored following acute COVID-19, in order to identify long COVID in a timely manner and offer treatment.Prevention of reinfection, nutrition support, and chronic disease management may reduce the risk.

Figure 1 .
Figure 1.Flowchart detailing the inclusion and exclusion criteria applied in the construction of the analytical cohort from the National COVID Cohort Collaborative (N3C) observational electronic health record data.Abbreviation: COVID-19, coronavirus disease 2019.